Task

  1. Develop a Shiny app that visualizes the progression of the 2019-20 Global Coronavirus Outbreak.

  2. If you are ambitious, bring in visualization of other types of data, e.g., stock market, tweets, contrast with previous outbreak (e.g., 2003 SARS, 2009 H1N1, 2012 MERS, 2014 Ebola), prediction from your statistical/epidemiological model, and so on.

  3. Publish your Shiny app to https://www.shinyapps.io and share the link.

  4. (Optinal) Submit your app to the 2020 Shiny Contest (deadline 20 March 2020 at 5pm ET).

Below is some data exploration, which may help you get started. You do not have to use it. Note below code may change frequently.

Prerequisite

Import JHU SSE data on GitHub into R

Since Feb 11, JHU SSE is hosting 2019 nCoV data on GitHub repo: https://github.com/CSSEGISandData/2019-nCoV.

Let’s import the time series data directly from the csv file on GitHub:

(confirmed <- read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"))
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   `Province/State` = col_character(),
##   `Country/Region` = col_character()
## )
## See spec(...) for full column specifications.
## # A tibble: 94 x 39
##    `Province/State` `Country/Region`   Lat  Long `1/22/20` `1/23/20` `1/24/20`
##    <chr>            <chr>            <dbl> <dbl>     <dbl>     <dbl>     <dbl>
##  1 Anhui            Mainland China    31.8  117.         1         9        15
##  2 Beijing          Mainland China    40.2  116.        14        22        36
##  3 Chongqing        Mainland China    30.1  108.         6         9        27
##  4 Fujian           Mainland China    26.1  118.         1         5        10
##  5 Gansu            Mainland China    36.1  104.         0         2         2
##  6 Guangdong        Mainland China    23.3  113.        26        32        53
##  7 Guangxi          Mainland China    23.8  109.         2         5        23
##  8 Guizhou          Mainland China    26.8  107.         1         3         3
##  9 Hainan           Mainland China    19.2  110.         4         5         8
## 10 Hebei            Mainland China    38.0  115.         1         1         2
## # … with 84 more rows, and 32 more variables: `1/25/20` <dbl>, `1/26/20` <dbl>,
## #   `1/27/20` <dbl>, `1/28/20` <dbl>, `1/29/20` <dbl>, `1/30/20` <dbl>,
## #   `1/31/20` <dbl>, `2/1/20` <dbl>, `2/2/20` <dbl>, `2/3/20` <dbl>,
## #   `2/4/20` <dbl>, `2/5/20` <dbl>, `2/6/20` <dbl>, `2/7/20` <dbl>,
## #   `2/8/20` <dbl>, `2/9/20` <dbl>, `2/10/20` <dbl>, `2/11/20` <dbl>,
## #   `2/12/20` <dbl>, `2/13/20` <dbl>, `2/14/20` <dbl>, `2/15/20` <dbl>,
## #   `2/16/20` <dbl>, `2/17/20` <dbl>, `2/18/20` <dbl>, `2/19/20` <dbl>,
## #   `2/20/20` <dbl>, `2/21/20` <dbl>, `2/22/20` <dbl>, `2/23/20` <dbl>,
## #   `2/24/20` <dbl>, `2/25/20` <dbl>
(recovered <- read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Recovered.csv"))
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   `Province/State` = col_character(),
##   `Country/Region` = col_character()
## )
## See spec(...) for full column specifications.
## # A tibble: 94 x 39
##    `Province/State` `Country/Region`   Lat  Long `1/22/20` `1/23/20` `1/24/20`
##    <chr>            <chr>            <dbl> <dbl>     <dbl>     <dbl>     <dbl>
##  1 Anhui            Mainland China    31.8  117.         0         0         0
##  2 Beijing          Mainland China    40.2  116.         0         0         1
##  3 Chongqing        Mainland China    30.1  108.         0         0         0
##  4 Fujian           Mainland China    26.1  118.         0         0         0
##  5 Gansu            Mainland China    36.1  104.         0         0         0
##  6 Guangdong        Mainland China    23.3  113.         0         2         2
##  7 Guangxi          Mainland China    23.8  109.         0         0         0
##  8 Guizhou          Mainland China    26.8  107.         0         0         0
##  9 Hainan           Mainland China    19.2  110.         0         0         0
## 10 Hebei            Mainland China    38.0  115.         0         0         0
## # … with 84 more rows, and 32 more variables: `1/25/20` <dbl>, `1/26/20` <dbl>,
## #   `1/27/20` <dbl>, `1/28/20` <dbl>, `1/29/20` <dbl>, `1/30/20` <dbl>,
## #   `1/31/20` <dbl>, `2/1/20` <dbl>, `2/2/20` <dbl>, `2/3/20` <dbl>,
## #   `2/4/20` <dbl>, `2/5/20` <dbl>, `2/6/20` <dbl>, `2/7/20` <dbl>,
## #   `2/8/20` <dbl>, `2/9/20` <dbl>, `2/10/20` <dbl>, `2/11/20` <dbl>,
## #   `2/12/20` <dbl>, `2/13/20` <dbl>, `2/14/20` <dbl>, `2/15/20` <dbl>,
## #   `2/16/20` <dbl>, `2/17/20` <dbl>, `2/18/20` <dbl>, `2/19/20` <dbl>,
## #   `2/20/20` <dbl>, `2/21/20` <dbl>, `2/22/20` <dbl>, `2/23/20` <dbl>,
## #   `2/24/20` <dbl>, `2/25/20` <dbl>
(death <- read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv"))
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   `Province/State` = col_character(),
##   `Country/Region` = col_character()
## )
## See spec(...) for full column specifications.
## # A tibble: 94 x 39
##    `Province/State` `Country/Region`   Lat  Long `1/22/20` `1/23/20` `1/24/20`
##    <chr>            <chr>            <dbl> <dbl>     <dbl>     <dbl>     <dbl>
##  1 Anhui            Mainland China    31.8  117.         0         0         0
##  2 Beijing          Mainland China    40.2  116.         0         0         0
##  3 Chongqing        Mainland China    30.1  108.         0         0         0
##  4 Fujian           Mainland China    26.1  118.         0         0         0
##  5 Gansu            Mainland China    36.1  104.         0         0         0
##  6 Guangdong        Mainland China    23.3  113.         0         0         0
##  7 Guangxi          Mainland China    23.8  109.         0         0         0
##  8 Guizhou          Mainland China    26.8  107.         0         0         0
##  9 Hainan           Mainland China    19.2  110.         0         0         0
## 10 Hebei            Mainland China    38.0  115.         0         1         1
## # … with 84 more rows, and 32 more variables: `1/25/20` <dbl>, `1/26/20` <dbl>,
## #   `1/27/20` <dbl>, `1/28/20` <dbl>, `1/29/20` <dbl>, `1/30/20` <dbl>,
## #   `1/31/20` <dbl>, `2/1/20` <dbl>, `2/2/20` <dbl>, `2/3/20` <dbl>,
## #   `2/4/20` <dbl>, `2/5/20` <dbl>, `2/6/20` <dbl>, `2/7/20` <dbl>,
## #   `2/8/20` <dbl>, `2/9/20` <dbl>, `2/10/20` <dbl>, `2/11/20` <dbl>,
## #   `2/12/20` <dbl>, `2/13/20` <dbl>, `2/14/20` <dbl>, `2/15/20` <dbl>,
## #   `2/16/20` <dbl>, `2/17/20` <dbl>, `2/18/20` <dbl>, `2/19/20` <dbl>,
## #   `2/20/20` <dbl>, `2/21/20` <dbl>, `2/22/20` <dbl>, `2/23/20` <dbl>,
## #   `2/24/20` <dbl>, `2/25/20` <dbl>

Tidy data

I want to tidy data into the long format. pivot_longer is the modern version of gather function in dplyr.

confirmed_long <- confirmed %>%
  pivot_longer(-(`Province/State`:Long), 
               names_to = "Date", 
               values_to = "confirmed") %>%
  mutate(Date = (mdy(Date))) # convert string to date-time
confirmed_long
## # A tibble: 3,290 x 6
##    `Province/State` `Country/Region`   Lat  Long Date       confirmed
##    <chr>            <chr>            <dbl> <dbl> <date>         <dbl>
##  1 Anhui            Mainland China    31.8  117. 2020-01-22         1
##  2 Anhui            Mainland China    31.8  117. 2020-01-23         9
##  3 Anhui            Mainland China    31.8  117. 2020-01-24        15
##  4 Anhui            Mainland China    31.8  117. 2020-01-25        39
##  5 Anhui            Mainland China    31.8  117. 2020-01-26        60
##  6 Anhui            Mainland China    31.8  117. 2020-01-27        70
##  7 Anhui            Mainland China    31.8  117. 2020-01-28       106
##  8 Anhui            Mainland China    31.8  117. 2020-01-29       152
##  9 Anhui            Mainland China    31.8  117. 2020-01-30       200
## 10 Anhui            Mainland China    31.8  117. 2020-01-31       237
## # … with 3,280 more rows
recovered_long <- recovered %>%
  pivot_longer(-(`Province/State`:Long), 
               names_to = "Date", 
               values_to = "recovered") %>%
  mutate(Date = mdy(Date))
recovered_long
## # A tibble: 3,290 x 6
##    `Province/State` `Country/Region`   Lat  Long Date       recovered
##    <chr>            <chr>            <dbl> <dbl> <date>         <dbl>
##  1 Anhui            Mainland China    31.8  117. 2020-01-22         0
##  2 Anhui            Mainland China    31.8  117. 2020-01-23         0
##  3 Anhui            Mainland China    31.8  117. 2020-01-24         0
##  4 Anhui            Mainland China    31.8  117. 2020-01-25         0
##  5 Anhui            Mainland China    31.8  117. 2020-01-26         0
##  6 Anhui            Mainland China    31.8  117. 2020-01-27         0
##  7 Anhui            Mainland China    31.8  117. 2020-01-28         0
##  8 Anhui            Mainland China    31.8  117. 2020-01-29         2
##  9 Anhui            Mainland China    31.8  117. 2020-01-30         2
## 10 Anhui            Mainland China    31.8  117. 2020-01-31         3
## # … with 3,280 more rows
death_long <- death %>%
  pivot_longer(-(`Province/State`:Long), 
               names_to = "Date", 
               values_to = "death") %>%
  mutate(Date = mdy(Date))
death_long
## # A tibble: 3,290 x 6
##    `Province/State` `Country/Region`   Lat  Long Date       death
##    <chr>            <chr>            <dbl> <dbl> <date>     <dbl>
##  1 Anhui            Mainland China    31.8  117. 2020-01-22     0
##  2 Anhui            Mainland China    31.8  117. 2020-01-23     0
##  3 Anhui            Mainland China    31.8  117. 2020-01-24     0
##  4 Anhui            Mainland China    31.8  117. 2020-01-25     0
##  5 Anhui            Mainland China    31.8  117. 2020-01-26     0
##  6 Anhui            Mainland China    31.8  117. 2020-01-27     0
##  7 Anhui            Mainland China    31.8  117. 2020-01-28     0
##  8 Anhui            Mainland China    31.8  117. 2020-01-29     0
##  9 Anhui            Mainland China    31.8  117. 2020-01-30     0
## 10 Anhui            Mainland China    31.8  117. 2020-01-31     0
## # … with 3,280 more rows
ncov_tbl <- confirmed_long %>%
  left_join(recovered_long) %>%
  left_join(death_long) %>%
  pivot_longer(confirmed:death, 
               names_to = "Case", 
               values_to = "Count")
## Joining, by = c("Province/State", "Country/Region", "Lat", "Long", "Date")
## Joining, by = c("Province/State", "Country/Region", "Lat", "Long", "Date")
ncov_tbl %>% print(width = Inf)
## # A tibble: 9,870 x 7
##    `Province/State` `Country/Region`   Lat  Long Date       Case      Count
##    <chr>            <chr>            <dbl> <dbl> <date>     <chr>     <dbl>
##  1 Anhui            Mainland China    31.8  117. 2020-01-22 confirmed     1
##  2 Anhui            Mainland China    31.8  117. 2020-01-22 recovered     0
##  3 Anhui            Mainland China    31.8  117. 2020-01-22 death         0
##  4 Anhui            Mainland China    31.8  117. 2020-01-23 confirmed     9
##  5 Anhui            Mainland China    31.8  117. 2020-01-23 recovered     0
##  6 Anhui            Mainland China    31.8  117. 2020-01-23 death         0
##  7 Anhui            Mainland China    31.8  117. 2020-01-24 confirmed    15
##  8 Anhui            Mainland China    31.8  117. 2020-01-24 recovered     0
##  9 Anhui            Mainland China    31.8  117. 2020-01-24 death         0
## 10 Anhui            Mainland China    31.8  117. 2020-01-25 confirmed    39
## # … with 9,860 more rows
# prepare data for shiny app
saveRDS(ncov_tbl, file = "../2019-20Coronavirus/ncov_tbl.rds")

Mapping China provinces

Use GIS data from China

A self note about installing sf package on CentOS 7: https://gist.github.com/Hua-Zhou/6c11babe35437ce1ea8e4893a14d07c8.

Donwload China GIS data from here, unzip, and put in the working directory.

ls -l bou2_4p.*
## -rw-r--r--. 1 xurui1996 xurui1996   86283 Feb 26 09:24 bou2_4p.dbf
## -rw-r--r--. 1 xurui1996 xurui1996 1508752 Feb 26 09:24 bou2_4p.shp
## -rw-r--r--. 1 xurui1996 xurui1996    7500 Feb 26 09:24 bou2_4p.shx

Read in the shape file into simple feature (SF) format. Replace the NA in NAME by Macau.

library(sf)
chn_map <- st_read("./bou2_4p.shp", as_tibble = TRUE) %>%
  mutate(NAME = iconv(NAME, from = "GBK"),
         BOU2_4M_ = as.integer(BOU2_4M_),
         BOU2_4M_ID = as.integer(BOU2_4M_ID)) %>%
  mutate(NAME = str_replace_na(NAME, replacement = "澳门特别行政区")) %>%
  print()
## Reading layer `bou2_4p' from data source `/home/xurui1996/biostat-203b-2020-winter/hw3/hw3sol/bou2_4p.shp' using driver `ESRI Shapefile'
## Simple feature collection with 925 features and 7 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 73.44696 ymin: 6.318641 xmax: 135.0858 ymax: 53.55793
## epsg (SRID):    NA
## proj4string:    NA
## Simple feature collection with 925 features and 7 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 73.44696 ymin: 6.318641 xmax: 135.0858 ymax: 53.55793
## epsg (SRID):    NA
## proj4string:    NA
## # A tibble: 925 x 8
##      AREA PERIMETER BOU2_4M_ BOU2_4M_ID ADCODE93 ADCODE99 NAME 
##  *  <dbl>     <dbl>    <int>      <int>    <int>    <int> <chr>
##  1  54.4     68.5          2         23   230000   230000 黑龙江省…
##  2 129.     130.           3         15   150000   150000 内蒙古自…
##  3 176.      84.9          4         65   650000   650000 新疆维吾…
##  4  21.3     41.2          5         22   220000   220000 吉林省…
##  5  15.6     38.4          6         21   210000   210000 辽宁省…
##  6  41.5     76.8          7         62   620000   620000 甘肃省…
##  7  19.5     44.9          8         13   130000   130000 河北省…
##  8   1.73     8.50         9         11   110000   110000 北京市…
##  9   0        0.088       10        292   210000   210000 辽宁省…
## 10   0        0.047       11        292   210000   210000 辽宁省…
## # … with 915 more rows, and 1 more variable: geometry <POLYGON>

There are about 34 provinces in China, why there are 925 areas?

chn_map %>% 
  count(NAME) %>% 
  print(n = Inf)
## Simple feature collection with 34 features and 2 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: 73.44696 ymin: 6.318641 xmax: 135.0858 ymax: 53.55793
## epsg (SRID):    NA
## proj4string:    NA
## # A tibble: 34 x 3
##    NAME             n                                                   geometry
##  * <chr>        <int>                                                 <GEOMETRY>
##  1 上海市          12 MULTIPOLYGON (((121.414 30.6952, 121.415 30.69756, 121.41…
##  2 云南省           1 POLYGON ((98.65588 28.83075, 98.65788 28.8333, 98.66165 2…
##  3 内蒙古自治区     1 POLYGON ((121.4884 53.33265, 121.4974 53.32104, 121.5004 …
##  4 北京市           1 POLYGON ((117.496 40.66534, 117.4992 40.64039, 117.4938 4…
##  5 台湾省          57 MULTIPOLYGON (((120.8155 21.75681, 120.8174 21.75765, 120…
##  6 吉林省           1 POLYGON ((123.171 46.24668, 123.2186 46.26945, 123.2394 4…
##  7 四川省           1 POLYGON ((102.447 33.56613, 102.4386 33.57738, 102.4011 3…
##  8 天津市           1 POLYGON ((117.4638 40.24003, 117.4962 40.22717, 117.5416 …
##  9 宁夏回族自治区…     1 POLYGON ((106.8099 39.30777, 106.7962 39.30038, 106.8025 …
## 10 安徽省           1 POLYGON ((116.3686 34.64087, 116.4237 34.65374, 116.432 3…
## 11 山东省          86 MULTIPOLYGON (((120.7183 36.28086, 120.7147 36.28513, 120…
## 12 山西省           1 POLYGON ((114.1232 40.74159, 114.2001 40.65104, 114.1953 …
## 13 广东省         154 MULTIPOLYGON (((109.9309 20.22808, 109.9309 20.22664, 109…
## 14 广西壮族自治区…     6 MULTIPOLYGON (((109.2133 20.89708, 109.2109 20.89759, 109…
## 15 新疆维吾尔自治区…     1 POLYGON ((96.38329 42.72696, 96.35991 42.70969, 96.09664 …
## 16 江苏省           5 MULTIPOLYGON (((119.4393 34.77124, 119.4419 34.77665, 119…
## 17 江西省           1 POLYGON ((116.1283 29.82727, 116.1726 29.83686, 116.2121 …
## 18 河北省           9 MULTIPOLYGON (((117.194 40.08278, 117.1983 40.0796, 117.2…
## 19 河南省           1 POLYGON ((113.7235 36.35595, 113.7516 36.37056, 113.8081 …
## 20 浙江省         179 MULTIPOLYGON (((120.4666 27.16607, 120.468 27.16719, 120.…
## 21 海南省          79 MULTIPOLYGON (((113.2364 6.323439, 113.2353 6.32, 113.233…
## 22 湖北省           1 POLYGON ((109.757 32.91204, 109.7823 33.00224, 109.7863 3…
## 23 湖南省           1 POLYGON ((111.2695 30.01506, 111.2882 30.01272, 111.2936 …
## 24 澳门特别行政区…     1 POLYGON ((114.2259 22.54496, 114.226 22.54444, 114.2263 2…
## 25 甘肃省           1 POLYGON ((96.38329 42.72696, 96.38308 42.72533, 96.39072 …
## 26 福建省         168 MULTIPOLYGON (((119.9835 25.96752, 119.986 25.96805, 119.…
## 27 西藏自治区       1 POLYGON ((79.03262 34.33298, 79.07736 34.41547, 79.15108 …
## 28 贵州省           2 MULTIPOLYGON (((109.5375 26.75003, 109.571 26.76731, 109.…
## 29 辽宁省          94 MULTIPOLYGON (((121.7979 40.99618, 121.7931 40.98311, 121…
## 30 重庆市           1 POLYGON ((108.5145 32.21204, 108.5308 32.195, 108.5808 32…
## 31 陕西省           1 POLYGON ((111.1508 39.54331, 111.1104 39.50884, 111.1047 …
## 32 青海省           1 POLYGON ((96.98931 38.82063, 96.98743 38.8617, 96.95775 3…
## 33 香港特别行政区…    53 MULTIPOLYGON (((113.9175 22.15853, 113.9029 22.15769, 113…
## 34 黑龙江省         1 POLYGON ((121.4884 53.33265, 121.4995 53.33601, 121.5184 …

ggplot2 can plot SF data using geom_sf function:

chn_map %>%
  ggplot() + 
  geom_sf(mapping = aes(geometry = geometry), color = "black", fill = "white") + 
  #geom_sf_label(mapping = aes(label = NAME)) + 
  theme_bw() # better for maps 

These are the provinces available from the JHU sheets.

ncov_tbl %>%
  filter(`Country/Region` %in% c("Mainland China", "Macau", "Hong Kong", "Taiwan")) %>%
  distinct(`Province/State`, `Country/Region`) %>%
  print(n = Inf)
## # A tibble: 34 x 2
##    `Province/State` `Country/Region`
##    <chr>            <chr>           
##  1 Anhui            Mainland China  
##  2 Beijing          Mainland China  
##  3 Chongqing        Mainland China  
##  4 Fujian           Mainland China  
##  5 Gansu            Mainland China  
##  6 Guangdong        Mainland China  
##  7 Guangxi          Mainland China  
##  8 Guizhou          Mainland China  
##  9 Hainan           Mainland China  
## 10 Hebei            Mainland China  
## 11 Heilongjiang     Mainland China  
## 12 Henan            Mainland China  
## 13 Hubei            Mainland China  
## 14 Hunan            Mainland China  
## 15 Inner Mongolia   Mainland China  
## 16 Jiangsu          Mainland China  
## 17 Jiangxi          Mainland China  
## 18 Jilin            Mainland China  
## 19 Liaoning         Mainland China  
## 20 Ningxia          Mainland China  
## 21 Qinghai          Mainland China  
## 22 Shaanxi          Mainland China  
## 23 Shandong         Mainland China  
## 24 Shanghai         Mainland China  
## 25 Shanxi           Mainland China  
## 26 Sichuan          Mainland China  
## 27 Tianjin          Mainland China  
## 28 Tibet            Mainland China  
## 29 Xinjiang         Mainland China  
## 30 Yunnan           Mainland China  
## 31 Zhejiang         Mainland China  
## 32 Taiwan           Taiwan          
## 33 Macau            Macau           
## 34 Hong Kong        Hong Kong

In order to join the tibbles ncov_tbl and chn_map, we need to use the province name as key. Let’s create a function to translate Chinese province name to English.

translate <- function(x) {
  sapply(x, function(chn_name) {
    if (str_detect(chn_name, "澳门")) {
      eng_name <- "Macau"
    } else if (str_detect(chn_name, "台湾")) {
      eng_name <- "Taiwan"
    } else if (str_detect(chn_name, "上海")) {
      eng_name <- "Shanghai"
    } else if (str_detect(chn_name, "云南")) {
      eng_name <- "Yunnan"
    } else if (str_detect(chn_name, "内蒙古")) {
      eng_name <- "Inner Mongolia"
    } else if (str_detect(chn_name, "北京")) {
      eng_name <- "Beijing"
    } else if (str_detect(chn_name, "台湾")) {
      eng_name <- "Taiwan"
    } else if (str_detect(chn_name, "吉林")) {
      eng_name <- "Jilin"
    } else if (str_detect(chn_name, "四川")) {
      eng_name <- "Sichuan"
    } else if (str_detect(chn_name, "天津")) {
      eng_name <- "Tianjin"
    } else if (str_detect(chn_name, "宁夏")) {
      eng_name <- "Ningxia"
    } else if (str_detect(chn_name, "安徽")) {
      eng_name <- "Anhui"
    } else if (str_detect(chn_name, "山东")) {
      eng_name <- "Shandong"
    } else if (str_detect(chn_name, "山西")) {
      eng_name <- "Shanxi"
    } else if (str_detect(chn_name, "广东")) {
      eng_name <- "Guangdong"
    } else if (str_detect(chn_name, "广西")) {
      eng_name <- "Guangxi"
    } else if (str_detect(chn_name, "新疆")) {
      eng_name <- "Xinjiang"
    } else if (str_detect(chn_name, "江苏")) {
      eng_name <- "Jiangsu"
    } else if (str_detect(chn_name, "江西")) {
      eng_name <- "Jiangxi"
    } else if (str_detect(chn_name, "河北")) {
      eng_name <- "Hebei"
    } else if (str_detect(chn_name, "河南")) {
      eng_name <- "Henan"
    } else if (str_detect(chn_name, "浙江")) {
      eng_name <- "Zhejiang"
    } else if (str_detect(chn_name, "海南")) {
      eng_name <- "Hainan"
    } else if (str_detect(chn_name, "湖北")) {
      eng_name <- "Hubei"
    } else if (str_detect(chn_name, "湖南")) {
      eng_name <- "Hunan"
    } else if (str_detect(chn_name, "甘肃")) {
      eng_name <- "Gansu"
    } else if (str_detect(chn_name, "福建")) {
      eng_name <- "Fujian"
    } else if (str_detect(chn_name, "西藏")) {
      eng_name <- "Tibet"
    } else if (str_detect(chn_name, "贵州")) {
      eng_name <- "Guizhou"
    } else if (str_detect(chn_name, "辽宁")) {
      eng_name <- "Liaoning"
    } else if (str_detect(chn_name, "重庆")) {
      eng_name <- "Chongqing"
    } else if (str_detect(chn_name, "陕西")) {
      eng_name <- "Shanxi"
    } else if (str_detect(chn_name, "青海")) {
      eng_name <- "Qinghai"
    } else if (str_detect(chn_name, "香港")) {
      eng_name <- "Hong Kong"
    } else if (str_detect(chn_name, "黑龙江")) {
      eng_name <- "Heilongjiang"
    } else {
      eng_name <- chn_name # don't translate if no correspondence
    }
    return(eng_name)
  })
}

Create a new variable NAME_ENG:

chn_prov <- chn_map %>% 
  count(NAME) %>%
  mutate(NAME_ENG = translate(NAME)) # translate function is vectorized
chn_prov %>% print(n = Inf)
## Simple feature collection with 34 features and 3 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: 73.44696 ymin: 6.318641 xmax: 135.0858 ymax: 53.55793
## epsg (SRID):    NA
## proj4string:    NA
## # A tibble: 34 x 4
##    NAME           n                                          geometry NAME_ENG  
##  * <chr>      <int>                                        <GEOMETRY> <chr>     
##  1 上海市        12 MULTIPOLYGON (((121.414 30.6952, 121.415 30.6975… Shanghai  
##  2 云南省         1 POLYGON ((98.65588 28.83075, 98.65788 28.8333, 9… Yunnan    
##  3 内蒙古自治区…     1 POLYGON ((121.4884 53.33265, 121.4974 53.32104, … Inner Mon…
##  4 北京市         1 POLYGON ((117.496 40.66534, 117.4992 40.64039, 1… Beijing   
##  5 台湾省        57 MULTIPOLYGON (((120.8155 21.75681, 120.8174 21.7… Taiwan    
##  6 吉林省         1 POLYGON ((123.171 46.24668, 123.2186 46.26945, 1… Jilin     
##  7 四川省         1 POLYGON ((102.447 33.56613, 102.4386 33.57738, 1… Sichuan   
##  8 天津市         1 POLYGON ((117.4638 40.24003, 117.4962 40.22717, … Tianjin   
##  9 宁夏回族自治区…     1 POLYGON ((106.8099 39.30777, 106.7962 39.30038, … Ningxia   
## 10 安徽省         1 POLYGON ((116.3686 34.64087, 116.4237 34.65374, … Anhui     
## 11 山东省        86 MULTIPOLYGON (((120.7183 36.28086, 120.7147 36.2… Shandong  
## 12 山西省         1 POLYGON ((114.1232 40.74159, 114.2001 40.65104, … Shanxi    
## 13 广东省       154 MULTIPOLYGON (((109.9309 20.22808, 109.9309 20.2… Guangdong 
## 14 广西壮族自治区…     6 MULTIPOLYGON (((109.2133 20.89708, 109.2109 20.8… Guangxi   
## 15 新疆维吾尔自治区…     1 POLYGON ((96.38329 42.72696, 96.35991 42.70969, … Xinjiang  
## 16 江苏省         5 MULTIPOLYGON (((119.4393 34.77124, 119.4419 34.7… Jiangsu   
## 17 江西省         1 POLYGON ((116.1283 29.82727, 116.1726 29.83686, … Jiangxi   
## 18 河北省         9 MULTIPOLYGON (((117.194 40.08278, 117.1983 40.07… Hebei     
## 19 河南省         1 POLYGON ((113.7235 36.35595, 113.7516 36.37056, … Henan     
## 20 浙江省       179 MULTIPOLYGON (((120.4666 27.16607, 120.468 27.16… Zhejiang  
## 21 海南省        79 MULTIPOLYGON (((113.2364 6.323439, 113.2353 6.32… Hainan    
## 22 湖北省         1 POLYGON ((109.757 32.91204, 109.7823 33.00224, 1… Hubei     
## 23 湖南省         1 POLYGON ((111.2695 30.01506, 111.2882 30.01272, … Hunan     
## 24 澳门特别行政区…     1 POLYGON ((114.2259 22.54496, 114.226 22.54444, 1… Macau     
## 25 甘肃省         1 POLYGON ((96.38329 42.72696, 96.38308 42.72533, … Gansu     
## 26 福建省       168 MULTIPOLYGON (((119.9835 25.96752, 119.986 25.96… Fujian    
## 27 西藏自治区     1 POLYGON ((79.03262 34.33298, 79.07736 34.41547, … Tibet     
## 28 贵州省         2 MULTIPOLYGON (((109.5375 26.75003, 109.571 26.76… Guizhou   
## 29 辽宁省        94 MULTIPOLYGON (((121.7979 40.99618, 121.7931 40.9… Liaoning  
## 30 重庆市         1 POLYGON ((108.5145 32.21204, 108.5308 32.195, 10… Chongqing 
## 31 陕西省         1 POLYGON ((111.1508 39.54331, 111.1104 39.50884, … Shanxi    
## 32 青海省         1 POLYGON ((96.98931 38.82063, 96.98743 38.8617, 9… Qinghai   
## 33 香港特别行政区…    53 MULTIPOLYGON (((113.9175 22.15853, 113.9029 22.1… Hong Kong 
## 34 黑龙江省       1 POLYGON ((121.4884 53.33265, 121.4995 53.33601, … Heilongji…
# prepare data for shiny app
saveRDS(chn_prov, file = "../2019-20Coronavirus/chn_prov.rds")

Plotting 2019-nCoV incidence

Try to join the virus data ncov_tbl and map data chn_prov.

# for exploration
plotdate <- "2020-02-14"
case <- "confirmed"
ncov_tbl %>%
  filter(`Country/Region` %in% c("Mainland China", "Macau", "Hong Kong", "Taiwan")) %>%
  filter(Date == plotdate, Case == case) %>%
  group_by(`Province/State`) %>%  
  top_n(1, Date) %>%
  right_join(chn_prov, by = c("Province/State" = "NAME_ENG")) # join map and virus data
## Warning: Column `Province/State`/`NAME_ENG` has different attributes on LHS and
## RHS of join
## # A tibble: 34 x 10
## # Groups:   Province/State [33]
##    `Province/State` `Country/Region`   Lat  Long Date       Case  Count NAME 
##    <chr>            <chr>            <dbl> <dbl> <date>     <chr> <dbl> <chr>
##  1 Shanghai         Mainland China    31.2  121. 2020-02-14 conf…   318 上海市…
##  2 Yunnan           Mainland China    25.0  101. 2020-02-14 conf…   162 云南省…
##  3 Inner Mongolia   Mainland China    44.1  114. 2020-02-14 conf…    65 内蒙古自…
##  4 Beijing          Mainland China    40.2  116. 2020-02-14 conf…   372 北京市…
##  5 Taiwan           Taiwan            23.7  121. 2020-02-14 conf…    18 台湾省…
##  6 Jilin            Mainland China    43.7  126. 2020-02-14 conf…    86 吉林省…
##  7 Sichuan          Mainland China    30.6  103. 2020-02-14 conf…   463 四川省…
##  8 Tianjin          Mainland China    39.3  117. 2020-02-14 conf…   120 天津市…
##  9 Ningxia          Mainland China    37.3  106. 2020-02-14 conf…    67 宁夏回族…
## 10 Anhui            Mainland China    31.8  117. 2020-02-14 conf…   934 安徽省…
## # … with 24 more rows, and 2 more variables: n <int>, geometry <GEOMETRY>

Plot confirmed cases on a specific date:

plotdate <- "2020-02-14"
case <- "confirmed"

ncov_tbl %>%
  filter(`Country/Region` %in% c("Mainland China", "Macau", "Hong Kong", "Taiwan")) %>%
  filter(Date == plotdate, Case == case) %>%
  group_by(`Province/State`) %>%  
  top_n(1, Date) %>% # take the latest count on that date
  right_join(chn_prov, by = c("Province/State" = "NAME_ENG")) %>%
  ggplot() +
  geom_sf(mapping = aes(fill = Count, geometry = geometry)) +
  # scale_fill_gradient(low = "white",
  #                     high = "red",
  #                     trans = "log10",
  #                     limits = c(1, 50000),
  #                     breaks = c(1, 10, 100, 1000, 10000),
  #                     name = "") +
  scale_fill_gradientn(colors = wes_palette("Zissou1", 100, type = "continuous"),
                       trans = "log10") + # can we find a better palette?
  # #scale_fill_brewer(palette = "Dark2") + 
  theme_bw() +
  labs(title = str_c(case, " cases"), subtitle = plotdate)
## Warning: Column `Province/State`/`NAME_ENG` has different attributes on LHS and
## RHS of join

To plot the line graph of confirmed cases over time:

ncov_tbl %>%
  filter(`Country/Region` %in% c("Mainland China", "Macau", "Hong Kong", "Taiwan")) %>%
  group_by(Date, Case) %>%  
  summarise(total_count = sum(Count)) %>%
  # print()
  ggplot() +
  geom_line(mapping = aes(x = Date, y = total_count, color = Case)) +
  scale_color_manual(values = c("blue", "black", "green")) + 
  scale_y_log10() +
  labs(y = "Count") + 
  theme_bw()

Animation

Resources about making animations in R:
- gganimate package.
- Section 8.3 of Geomcomputation with R.

ncov_tbl %>%
  filter(`Country/Region` %in% c("Mainland China", "Macau", "Hong Kong", "Taiwan")) %>%
  filter(Case == case) %>%
  right_join(chn_prov, by = c("Province/State" = "NAME_ENG")) %>%
  print()
## Warning: Column `Province/State`/`NAME_ENG` has different attributes on LHS and
## RHS of join
## # A tibble: 1,190 x 10
##    `Province/State` `Country/Region`   Lat  Long Date       Case  Count NAME 
##    <chr>            <chr>            <dbl> <dbl> <date>     <chr> <dbl> <chr>
##  1 Shanghai         Mainland China    31.2  121. 2020-01-22 conf…     9 上海市…
##  2 Shanghai         Mainland China    31.2  121. 2020-01-23 conf…    16 上海市…
##  3 Shanghai         Mainland China    31.2  121. 2020-01-24 conf…    20 上海市…
##  4 Shanghai         Mainland China    31.2  121. 2020-01-25 conf…    33 上海市…
##  5 Shanghai         Mainland China    31.2  121. 2020-01-26 conf…    40 上海市…
##  6 Shanghai         Mainland China    31.2  121. 2020-01-27 conf…    53 上海市…
##  7 Shanghai         Mainland China    31.2  121. 2020-01-28 conf…    66 上海市…
##  8 Shanghai         Mainland China    31.2  121. 2020-01-29 conf…    96 上海市…
##  9 Shanghai         Mainland China    31.2  121. 2020-01-30 conf…   112 上海市…
## 10 Shanghai         Mainland China    31.2  121. 2020-01-31 conf…   135 上海市…
## # … with 1,180 more rows, and 2 more variables: n <int>,
## #   geometry <MULTIPOLYGON>

Plot the date at all time points (this takes long, a couple minutes):

library(gganimate)
library(transformr)
## 
## Attaching package: 'transformr'
## The following object is masked from 'package:sf':
## 
##     st_normalize
case = "confirmed"

(p <- ncov_tbl %>%  
  filter(`Country/Region` %in% c("Mainland China", "Macau", "Hong Kong", "Taiwan")) %>%
  filter(Case == case) %>%
  right_join(chn_prov, by = c("Province/State" = "NAME_ENG")) %>%
  ggplot() + 
  geom_sf(mapping = aes(fill = Count, geometry = geometry)) + 
  # scale_fill_gradient(low = "white",
  #                     high = "red",
  #                     trans = "log10",
  #                     limits = c(1, 100000),
  #                     breaks = c(1, 10, 100, 1000, 10000),
  #                     name = "") +
  scale_fill_gradientn(colours = wes_palette("Zissou1", 100, type = "continuous"),
                       trans = "log10") + 
  theme_bw() +
  labs(title = str_c(case, " cases")))
## Warning: Column `Province/State`/`NAME_ENG` has different attributes on LHS and
## RHS of join
## Warning: Transformation introduced infinite values in discrete y-axis

Make animation and save as gif (this takes long, a couple minutes)

(anim <- p + 
  transition_time(Date) + 
  labs(title = "Confirmed Cases", subtitle = "Date: {frame_time}"))
## Warning: Transformation introduced infinite values in discrete y-axis
animate(anim, renderer = gifski_renderer())
## Warning: Transformation introduced infinite values in discrete y-axis

anim_save("confirmed_anim.gif")

Impact on economy

library(quantmod)
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
## 
##     first, last
## Loading required package: TTR
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## Version 0.4-0 included new data defaults. See ?getSymbols.
stock <- getSymbols("^HSI", # S&P 500 (^GSPC), Dow Jones (^DJI), NASDAQ (^IXIC), Russell 2000 (^RUT), FTSE 100 (^FTSE), Nikkei 225 (^N225), HANG SENG INDEX (^HSI)
                    src = "yahoo", 
                    auto.assign = FALSE, 
                    from = min(ncov_tbl$Date),
                    to = max(ncov_tbl$Date)) %>% 
  as_tibble(rownames = "Date") %>%
  mutate(Date = date(Date)) %>%
  ggplot() + 
  geom_line(mapping = aes(x = Date, y = HSI.Adjusted)) +
  theme_bw()
## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
## 
## This message is shown once per session and may be disabled by setting 
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
## Warning: 'indexClass<-' is deprecated.
## Use 'tclass<-' instead.
## See help("Deprecated") and help("xts-deprecated").
## Warning: Calling `as_tibble()` on a vector is discouraged, because the behavior is likely to change in the future. Use `tibble::enframe(name = NULL)` instead.
## This warning is displayed once per session.
stock

# chartSeries(stock, theme = chartTheme("white"),
#            type = "line", log.scale = FALSE, TA = NULL)

Mapping countries

countries <- map_data("world") %>% as_tibble()
head(countries, 20)
## # A tibble: 20 x 6
##     long   lat group order region      subregion
##    <dbl> <dbl> <dbl> <int> <chr>       <chr>    
##  1 -69.9  12.5     1     1 Aruba       <NA>     
##  2 -69.9  12.4     1     2 Aruba       <NA>     
##  3 -69.9  12.4     1     3 Aruba       <NA>     
##  4 -70.0  12.5     1     4 Aruba       <NA>     
##  5 -70.1  12.5     1     5 Aruba       <NA>     
##  6 -70.1  12.6     1     6 Aruba       <NA>     
##  7 -70.0  12.6     1     7 Aruba       <NA>     
##  8 -70.0  12.6     1     8 Aruba       <NA>     
##  9 -69.9  12.5     1     9 Aruba       <NA>     
## 10 -69.9  12.5     1    10 Aruba       <NA>     
## 11  74.9  37.2     2    12 Afghanistan <NA>     
## 12  74.8  37.2     2    13 Afghanistan <NA>     
## 13  74.8  37.2     2    14 Afghanistan <NA>     
## 14  74.7  37.3     2    15 Afghanistan <NA>     
## 15  74.7  37.3     2    16 Afghanistan <NA>     
## 16  74.7  37.3     2    17 Afghanistan <NA>     
## 17  74.6  37.2     2    18 Afghanistan <NA>     
## 18  74.4  37.2     2    19 Afghanistan <NA>     
## 19  74.4  37.1     2    20 Afghanistan <NA>     
## 20  74.5  37.1     2    21 Afghanistan <NA>
translate2 <- function(x) {
  sapply(x, function(name) {
    if (str_detect(name, "China")) {
      # note here the Mainland China represents the whole China
      new_name <- "Mainland China"
    } else if (str_detect(name, "Taiwan")) {
      new_name <- "Mainland China"
    } else if (str_detect(name, "USA")) {
      new_name <- "US"
    } else {
      new_name <- name
    }
  })
}

countries <- countries %>% 
  mutate(new_name = translate2(`region`))
countries
## # A tibble: 99,338 x 7
##     long   lat group order region subregion new_name
##    <dbl> <dbl> <dbl> <int> <chr>  <chr>     <chr>   
##  1 -69.9  12.5     1     1 Aruba  <NA>      Aruba   
##  2 -69.9  12.4     1     2 Aruba  <NA>      Aruba   
##  3 -69.9  12.4     1     3 Aruba  <NA>      Aruba   
##  4 -70.0  12.5     1     4 Aruba  <NA>      Aruba   
##  5 -70.1  12.5     1     5 Aruba  <NA>      Aruba   
##  6 -70.1  12.6     1     6 Aruba  <NA>      Aruba   
##  7 -70.0  12.6     1     7 Aruba  <NA>      Aruba   
##  8 -70.0  12.6     1     8 Aruba  <NA>      Aruba   
##  9 -69.9  12.5     1     9 Aruba  <NA>      Aruba   
## 10 -69.9  12.5     1    10 Aruba  <NA>      Aruba   
## # … with 99,328 more rows
# prepare data for shiny app
saveRDS(countries, file = "../2019-20Coronavirus/countries.rds")
plotdate <- "2020-02-14"
case <- "confirmed"

(info <- ncov_tbl %>%
  filter(Date == plotdate, Case == case) %>%
  group_by(`Country/Region`) %>%  
  top_n(1, Date)) # take the latest count on that date
## # A tibble: 94 x 7
## # Groups:   Country/Region [42]
##    `Province/State` `Country/Region`   Lat  Long Date       Case      Count
##    <chr>            <chr>            <dbl> <dbl> <date>     <chr>     <dbl>
##  1 Anhui            Mainland China    31.8  117. 2020-02-14 confirmed   934
##  2 Beijing          Mainland China    40.2  116. 2020-02-14 confirmed   372
##  3 Chongqing        Mainland China    30.1  108. 2020-02-14 confirmed   537
##  4 Fujian           Mainland China    26.1  118. 2020-02-14 confirmed   281
##  5 Gansu            Mainland China    36.1  104. 2020-02-14 confirmed    90
##  6 Guangdong        Mainland China    23.3  113. 2020-02-14 confirmed  1261
##  7 Guangxi          Mainland China    23.8  109. 2020-02-14 confirmed   226
##  8 Guizhou          Mainland China    26.8  107. 2020-02-14 confirmed   140
##  9 Hainan           Mainland China    19.2  110. 2020-02-14 confirmed   159
## 10 Hebei            Mainland China    38.0  115. 2020-02-14 confirmed   283
## # … with 84 more rows
China <- info %>%
  filter(`Country/Region` %in% c("Mainland China", "Macau", 
                                 "Hong Kong", "Taiwan")) %>%
  group_by(Case) %>%
  mutate(tmp = sum(Count)) %>% 
  top_n(1, `Province/State`) %>%
  select(-`Count`)
colnames(China)[7] <- "Count"
China
## # A tibble: 1 x 7
## # Groups:   Case [1]
##   `Province/State` `Country/Region`   Lat  Long Date       Case      Count
##   <chr>            <chr>            <dbl> <dbl> <date>     <chr>     <dbl>
## 1 Zhejiang         Mainland China    29.2  120. 2020-02-14 confirmed 66376
Other <- info %>% 
  filter(`Country/Region` != "Mainland China", `Country/Region` != "Macau", 
         `Country/Region` != "Hong Kong", `Country/Region` != "Taiwan",
         `Country/Region` != "Others") %>%
  mutate(tmp = sum(Count)) %>%
  top_n(1, `Province/State`) %>%
  select(- `Count`)
colnames(Other)[7] <- "Count"
Other %>% print(n = Inf)
## # A tibble: 5 x 7
## # Groups:   Country/Region [5]
##   `Province/State`          `Country/Region`   Lat  Long Date       Case   Count
##   <chr>                     <chr>            <dbl> <dbl> <date>     <chr>  <dbl>
## 1 Toronto, ON               Canada            43.7 -79.4 2020-02-14 confi…     7
## 2 Victoria                  Australia        -37.8 145.  2020-02-14 confi…    15
## 3 From Diamond Princess     Israel            31.0  34.9 2020-02-14 confi…     0
## 4 Unassigned Location (Fro… US                37.1 -95.7 2020-02-14 confi…    15
## 5 None                      Austria           47.5  14.6 2020-02-14 confi…     0
newInfo <- bind_rows(China, Other) %>%
  print(n = Inf)
## # A tibble: 6 x 7
## # Groups:   Case [1]
##   `Province/State`          `Country/Region`   Lat  Long Date       Case   Count
##   <chr>                     <chr>            <dbl> <dbl> <date>     <chr>  <dbl>
## 1 Zhejiang                  Mainland China    29.2 120.  2020-02-14 confi… 66376
## 2 Toronto, ON               Canada            43.7 -79.4 2020-02-14 confi…     7
## 3 Victoria                  Australia        -37.8 145.  2020-02-14 confi…    15
## 4 From Diamond Princess     Israel            31.0  34.9 2020-02-14 confi…     0
## 5 Unassigned Location (Fro… US                37.1 -95.7 2020-02-14 confi…    15
## 6 None                      Austria           47.5  14.6 2020-02-14 confi…     0

Try to join the virus data newInfo and map data countries.

map <- newInfo %>% 
  left_join(countries, by = c("Country/Region" = "new_name"))
## Warning: Column `Country/Region`/`new_name` has different attributes on LHS and
## RHS of join

Plot confirmed cases on a specific date:

ggplot(map, aes(long, lat, group = group, color = Count)) + 
  geom_polygon(aes(fill = Count)) + 
  # scale_fill_gradientn(colours = wes_palette("Zissou1", 100, 
  #                     type =   "continuous"), trans = "log10") + 
  scale_fill_gradientn(
    colours = c("#461863", "#404E88", "#2A8A8C", "#7FD157", "#F9E53F"), 
    values = scales::rescale(c(5, 10, 20, 100, 100000)), 
    labels = scales::comma, 
    breaks = c(5, 10, 20, 100, 100000)
  ) +
  guides(fill = guide_legend(reverse = T)) +
  theme_bw() +
  labs(title = str_c(case, " cases")) +
  coord_quickmap()

To plot the line graph of confirmed cases over time.

ncov_tbl %>%
  group_by(Date, Case) %>%  
  summarise(total_count = sum(Count)) %>%
  # print()
  ggplot() +
  geom_line(mapping = aes(x = Date, y = total_count, color = Case), size = 2) + 
  scale_color_manual(values = c("blue", "black", "green")) + 
  scale_y_log10() + 
  labs(y = "Count") + 
  theme_bw()